21 research outputs found

The Pulse of News in Social Media: Forecasting Popularity

Author: Asur Sitaram
Bandari Roja
Huberman Bernardo A.
Publication date: 01/01/2012

News articles are extremely time-sensitive by nature, and news items compete intensely to propagate as widely as possible. Hence, the task of predicting the popularity of news items on the social web is both interesting and challenging. Prior research has dealt with predicting eventual online popularity based on early popularity. It is most desirable, however, to predict the popularity of items prior to their release, which makes it possible to decide how to modify an article and the manner of its publication. In this paper, we construct a multi-dimensional feature space derived from properties of an article and evaluate the efficacy of these features as predictors of online popularity. We examine both regression and classification algorithms and demonstrate that, despite randomness in human behavior, it is possible to predict ranges of popularity on Twitter with an overall accuracy of 84%. Our study also illustrates the differences between traditionally prominent sources and those immensely popular on the social web.

arXiv.org e-Print Archive

CiteSeerX

Blind Men and the Elephant: Detecting Evolving Groups In Social News

Author: Bandari Roja
Rahmandad Hazhir
Roychowdhury Vwani P.
Publication date: 28/06/2013

We propose an automated and unsupervised methodology for a novel summarization of group behavior based on content preference. We show that graph-theoretical community evolution (based on similarity of user preference for content) is effective in indexing these dynamics. Combined with text analysis that targets automatically identified representative content for each community, our method produces a novel multi-layered representation of evolving group behavior. We demonstrate this methodology in the context of political discourse on a social news site with data that spans more than four years, and we find coexisting political leanings over extended periods and a disruptive external event that led to a significant reorganization of existing patterns. Finally, for settings where no ground truth exists, we propose a new evaluation approach that uses entropy measures as evidence of coherence along the evolution path of these groups. This methodology is valuable to designers and managers of online forums who need granular analytics of user activity, as well as to researchers in the social and political sciences who wish to extend their inquiries to large-scale data available on the web. Comment: 10 pages, icwsm201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
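
The abstract proposes entropy measures as evidence of coherence but does not spell out the exact measure. As a minimal sketch, assuming Shannon entropy (in bits) over the distribution of content a community engages with, a lower value indicates preferences concentrated on fewer items. The helper `shannon_entropy` and the topic labels below are hypothetical illustrations, not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy, in bits, of the empirical distribution of `items`.

    Hypothetical helper: `items` might be, e.g., topic labels of the posts
    a single community promoted during one time window.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # define the entropy of an empty sample as zero
    # H = sum over categories of p * log2(1/p), where p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A community focused on one topic vs. one spread evenly over four topics.
focused = ["politics"] * 9 + ["sports"]
diffuse = ["politics", "sports", "science", "arts"] * 3
print(round(shannon_entropy(focused), 3))  # 0.469
print(shannon_entropy(diffuse))            # 2.0
```

Writing the sum as p * log2(1/p) avoids returning -0.0 for a single-category sample.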

Predicting Rising Follower Counts on Twitter Using Profile Information

Author: Bandari Roja
Gaudeul Alexia
Kaiser Astrid
Noro Tomoya
Oliver J. Eric
Razis Gerasimos
Srinivasan M. S.
Tsur Oren
Twitter
Publication venue: Association for Computing Machinery (ACM)
Publication date: 09/05/2017

When evaluating the cause of one's popularity on Twitter, one thing is considered to be the main driver: many tweets. There is debate about the kind of tweet one should publish, but little beyond tweets. Of particular interest is the information provided by each Twitter user's profile page. One of its features is the given name on the profile. Studies in psychology and economics have identified correlations between a person's first name and, e.g., their school marks or their chances of getting a job interview in the US. Therefore, we are interested in the influence of this profile information on the follower count. We addressed this question by analyzing the profiles of about 6 million Twitter users. All profiles are separated into three groups: users that have a first name, English words, or neither in their name field. The assumption is that names and words influence the discoverability of a user and subsequently his/her follower count. We propose a classifier that labels users who will increase their follower count within a month by applying different models based on the user's group. The classifiers are evaluated with the area under the receiver operating characteristic (ROC) curve and achieve a score above 0.800. Comment: 10 pages, 3 figures, 8 tables, WebSci '17, June 25-28, 2017, Troy, NY, US

arXiv.org e-Print Archive

Crossref
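
The abstract reports classifier performance as the area under the ROC curve (AUC). As a minimal, self-contained sketch (not the paper's evaluation code), AUC equals the probability that a randomly chosen positive example, here hypothetically a user whose follower count rises, receives a higher score than a randomly chosen negative one, with ties counted as one half. The function `roc_auc` and the sample labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for positive (e.g., follower count increased), 0 otherwise.
    scores: classifier scores; higher means "more likely positive".
    Runs in O(P * N) time, which is fine for small illustrative inputs.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # 0.889
```

Eight of the nine positive/negative pairs are ordered correctly, giving 8/9. A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking.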

An Automated Pipeline for Character and Relationship Extraction from Readers' Literary Book Reviews on Goodreads.com

Author: Bandari Roja
Ebrahimzadeh Ehsan
Falahi Misagh
Holur Pavan
Roychowdhury Vwani
Shahbazi Behnam
Shahsavari Shadi
Tangherlini Timothy R.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/04/2020

Reader reviews of literary fiction on social media, especially those in persistent, dedicated forums, create and are in turn driven by underlying narrative frameworks. In their comments about a novel, readers generally include only a subset of characters and their relationships, thus offering a limited perspective on that work. Yet in aggregate, these reviews capture an underlying narrative framework composed of different actants (people, places, things), their roles, and interactions that we label the "consensus narrative framework". We represent this framework in the form of an actant-relationship story graph. Extracting this graph is a challenging computational problem, which we pose as a latent graphical model estimation problem. Posts and reviews are viewed as samples of subgraphs/networks of the hidden narrative framework. Inspired by the qualitative narrative theory of Greimas, we formulate a graphical generative Machine Learning (ML) model where nodes represent actants, and multi-edges and self-loops among nodes capture context-specific relationships. We develop a pipeline of interlocking automated methods to extract key actants and their relationships, and apply it to thousands of reviews and comments posted on Goodreads.com. We manually derive the ground truth narrative framework from SparkNotes, and then use word embedding tools to compare relationships in ground truth networks with our extracted networks. We find that our automated methodology generates highly accurate consensus narrative frameworks: for our four target novels, with approximately 2900 reviews per novel, we report an average coverage/recall of important relationships of >80% and an average edge detection rate of >89%. These extracted narrative frameworks can generate insight into how people (or classes of people) read and how they recount what they have read to others.

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Gestalt Computing and the Study of Content-oriented User Behavior on the Web

Author: Bandari Roja
Publication venue: eScholarship, University of California
Publication date: 01/01/2013

Elementary actions online establish an individual's existence on the web and her/his orientation toward different issues. In this sense, actions truly define a user in spaces like online forums and communities, and the aggregate of elementary actions shapes the atmosphere of these online spaces. This observation, coupled with the unprecedented scale and detail of data on user actions on the web, compels us to use these data to understand collective human behavior. Despite large investments by industry to capture this data and the expanding body of research on big data in academia, gaining insight into collective user behavior online has been elusive. If one can overcome the considerable computational challenges posed by both the scale and the inevitable noisiness of the associated data sets, one could provide new automated frameworks to extract insights into evolving behavior at different scales and form an altogether different perspective on aggregated elementary user actions. This thesis addresses this fundamental and pressing problem and offers a gestalt computing approach to studying complex social phenomena in large datasets. This approach involves extracting macro structures from aggregated user actions, finding their possible meanings, and arranging data in layers so that it is iteratively explorable. The dissertation includes three major sections: first, modeling and prediction of the diffusion of information by users on the social web; next, detection of topics promoted by user communities; and finally, presentation of the gestalt computing framework through a methodology that uses graph theory, language processing, and information theory to provide a top-down map of group dynamics on social news websites. We find not only statistical significance in the extracted structure, but also that the results are meaningful to human understanding. The efficacy of the proposed methodologies is established via multiple real-world data sets.

Ezid

eScholarship - University of California

A Resistant Strain: Revealing the Online Grassroots Rise of the Antivaccination Movement

Author: Bandari Roja
Publication date: 05/12/2017

Ezid

Recommended from our members

Communication vs. Performance in Source Localization

Author: Bandari Roja
Pottie Gregory
Publication venue: eScholarship, University of California
Publication date: 10/10/2007

Acoustic source localization often requires the transmission of full received waveforms to a fusion center. Using these waveforms, the location of a source can be estimated by different methods such as beamforming, MUSIC, or AML. In each of these cases, a large number of bits is communicated to the fusion center. When communication has to be wireless, a considerable amount of energy is expended, and where power is not readily available, this can shorten the lifetime of the system. We are interested in investigating how much accuracy is lost by reducing the number of bits transmitted by each sensor. This poster presents a study of the tradeoffs between localization performance and the number of bits transmitted. A few cases were simulated in which sensors can measure signal power and can transmit only one bit in one case and two bits in another.

eScholarship - University of California
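
The poster compares localization accuracy when each sensor sends only one or two bits about its measured signal power. As a hedged sketch of the sensor side only (the poster's actual thresholds and localization method are not given here), a sensor could quantize its power reading against fixed thresholds: one threshold yields a 1-bit code, and three thresholds yield a 2-bit code. The function name `quantize_power`, the readings, and the threshold values are hypothetical.

```python
from bisect import bisect_right


def quantize_power(power, thresholds):
    """Map a measured signal power to a small integer code.

    With k ascending thresholds the code ranges over 0..k, so one threshold
    gives a 1-bit code and three thresholds give a 2-bit code.
    The thresholds are hypothetical, not taken from the poster.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")
    # Count how many thresholds `power` meets or exceeds.
    return bisect_right(thresholds, power)


readings = [0.05, 0.3, 0.6, 0.95]  # hypothetical normalized power readings
one_bit = [0.5]                    # 2 levels -> 1 bit per sensor
two_bit = [0.25, 0.5, 0.75]        # 4 levels -> 2 bits per sensor
print([quantize_power(p, one_bit) for p in readings])  # [0, 0, 1, 1]
print([quantize_power(p, two_bit) for p in readings])  # [0, 1, 2, 3]
```

A fusion center would then have to estimate the source location from these coarse codes instead of full waveforms. How much accuracy that costs is the trade-off the poster examines.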